Summary. Phylogenetic trees constructed using predicted amino acid sequences 
of putative proteins of coronavirus HKU1 (CoV-HKU1) revealed that CoV-HKU1 
formed a distinct branch among group 2 coronaviruses. Of the 14 trees from p65 
to nsp10, nine showed that CoV-HKU1 was clustered with murine hepatitis virus. 
From nsp11, the topologies of the trees changed dramatically. For the eight trees 
from nsp11 to N, seven showed that the CoV-HKU1 branch was the first branch. 
The codon usage patterns of CoV-HKU1 differed significantly from those in other 
group 2 coronaviruses. Split decomposition analysis revealed that recombination 
events had occurred between CoV-HKU1 and other coronaviruses. 


Introduction 


It has been estimated that coronaviruses [human coronaviruses 229E (HCoV-229E)
and OC43 (HCoV-OC43)] cause about 5-30% of respiratory tract infections.
In late 2002 and 2003, Severe Acute Respiratory Syndrome (SARS), caused
by SARS coronavirus (SARS-CoV), resulted in more than 750 deaths [12, 15,
16, 17, 22-24]. In early 2004, a novel coronavirus associated with respiratory
tract infections, human coronavirus NL63 (HCoV-NL63), was discovered [3, 20].
As a result of a unique mechanism of viral replication, coronaviruses have a high
frequency of recombination [9, 10, 13, 14].

Coronaviruses were divided into three groups, with HCoV-229E and HCoV-NL63
being group 1 coronaviruses and HCoV-OC43 a group 2 coronavirus [11].
SARS-CoV was initially proposed to constitute a distinct group of
coronavirus [15, 17]. However, more extensive phylogenetic analysis showed
that SARS-CoV probably represents a distant relative of group 2
coronaviruses [2, 18]. Further in silico analysis also predicted that
SARS-CoV could be a product of recombination between mammalian and avian
coronaviruses [19].

Recently, we described the discovery of a novel coronavirus associated
with pneumonia, coronavirus HKU1 (CoV-HKU1) [21]. Based on analysis of the
putative chymotrypsin-like protease (3CLpro), RNA-dependent RNA polymerase
(Pol), helicase, hemagglutinin-esterase (HE), spike (S), envelope (E), membrane
(M) and nucleocapsid (N), CoV-HKU1 is a member of group 2 coronaviruses.
However, the origin of CoV-HKU1 is still unknown. In this study, we performed
a detailed phylogenetic analysis of CoV-HKU1. Possible recombination events
were predicted and the origin of CoV-HKU1 is discussed.


Materials and methods 


The predicted amino acid (a.a.) sequences of p65, conserved portions of nsp1 [papain-like
protease 1 (PL1pro), Appr-1-p processing enzyme family (A1pp), papain-like protease 2
(PL2pro), hydrophobic domain 1 (HD1), and hydrophobic domain 2 (HD2)], nsp2-7, nsp9-13,
HE, S, E, M and N were extracted from the CoV-HKU1 genome sequence (GenBank accession
no. AY597011) [21]. The corresponding a.a. sequences of murine hepatitis virus (MHV),
HCoV-OC43, bovine coronavirus (BCoV), porcine hemagglutinating encephalomyelitis virus
(PHEV), rat sialodacryoadenitis coronavirus (SDAV) and puffinosis virus (PV) were extracted
from complete genome sequences of MHV (GenBank accession no. AF201929), HCoV-OC43
(GenBank accession no. AY585229) and BCoV (GenBank accession no. NC_003045), and
sequences of PHEV, SDAV and PV available in GenBank. The a.a. sequence of HE of MHV
was extracted from MHV strain JHM (GenBank accession no. BAA00661) because the HE
gene in MHV (GenBank accession no. AF201929) stopped prematurely after the 97th a.a.
Phylogenetic trees were constructed using the neighbour-joining method with ClustalX
1.83. The corresponding a.a. sequences of HCoV-229E were used as outgroups, except for
p65 and HE, which are not present in the genome of HCoV-229E. For p65 and
HE, the corresponding a.a. sequences in SARS-CoV and influenza C virus, respectively,
were used as the outgroups. Phylogenetic trees were not constructed for p28 and the
predicted hypothetical proteins of ORF4 and ORF8 in CoV-HKU1 because no a.a. sequences
suitable as outgroups could be found.

The amino-terminal 800 a.a. residues of the S proteins in various group 1 coronaviruses 
[porcine transmissible gastroenteritis virus (TGEV), HCoV-NL63 and HCoV-229E], various 
group 2 coronaviruses (PHEV, SDAV, MHV, HCoV-OC43 and BCoV), infectious bronchitis 
virus (IBV) (a group 3 coronavirus), SARS-CoV and CoV-HKU1 were aligned using ClustalX 
1.83. The presence and positions of conserved cysteine residues in the various peptides were 
compared. 
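
The cysteine comparison described above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the handling of gaps and the toy sequences in the usage note are assumptions made for illustration, and the aligned input is presumed to come from a separate alignment step such as ClustalX.

```python
def cysteine_positions(protein, n_terminal=800):
    """Return 1-based positions of cysteine (C) in the first n_terminal residues."""
    window = protein[:n_terminal].upper()
    return [i for i, residue in enumerate(window, start=1) if residue == "C"]


def conserved_cysteine_columns(aligned):
    """Return 1-based alignment columns in which every sequence has a cysteine.

    `aligned` maps a sequence name to a gapped sequence; all sequences
    must have the same length (assumption: they come from one alignment).
    """
    sequences = [seq.upper() for seq in aligned.values()]
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("aligned sequences must have equal length")
    return [
        column
        for column, residues in enumerate(zip(*sequences), start=1)
        if all(residue == "C" for residue in residues)
    ]
```

With toy input, `cysteine_positions("MCAACC", n_terminal=5)` reports the cysteines at positions 2 and 5, and `conserved_cysteine_columns({"a": "MC-AC", "b": "MCC-C"})` reports columns 2 and 5 as conserved.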

Correspondence analysis was used to compare the codon usage pattern variation in the
different genes among group 2 coronaviruses in a multidimensional space [5]. All available
sequences of ORF 1ab, HE, S, M and N of MHV, HCoV-OC43, BCoV, PHEV, SDAV, PV and
SARS-CoV were downloaded from GenBank (Table 1). Analysis of codon usage in these
sequences and the corresponding ones in CoV-HKU1 was performed using CodonW
(http://www.molbiol.ox.ac.uk/cu/), with each gene represented as a 59-dimensional vector,
one dimension for each of the 59 synonymous sense codons. AUG, the only codon for
methionine, UGG, the only codon for tryptophan, and the three stop codons were excluded.
The ORF for E was excluded because the gene was too short.
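
To make the 59-dimensional representation concrete, the following Python sketch builds the list of 59 codons from the standard genetic code and turns one ORF into a vector of relative codon frequencies. It is an illustration only and does not reproduce CodonW or its correspondence analysis; the function name `codon_usage_vector` and the choice of relative frequencies as vector entries are assumptions.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Drop ATG (Met), TGG (Trp) and the three stop codons: 64 - 5 = 59 codons.
SENSE_CODONS = sorted(c for c, aa in CODON_TABLE.items() if aa not in "MW*")


def codon_usage_vector(orf):
    """Return relative frequencies of the 59 codons in an in-frame ORF."""
    sequence = orf.upper().replace("U", "T")
    usable = len(sequence) - len(sequence) % 3  # ignore a trailing partial codon
    counts = Counter(sequence[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[codon] for codon in SENSE_CODONS)
    return [counts[codon] / total if total else 0.0 for codon in SENSE_CODONS]
```

Each gene then becomes one row of a genes-by-codons matrix, which is the kind of input a correspondence analysis decomposes into axes such as those plotted in Fig. 2b.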

To delineate the importance of recombination in the evolution of CoV-HKU1, split
decomposition analysis was performed. Deduced a.a. sequences of group 1, 2 and 3
coronaviruses and SARS-CoV available in GenBank that were homologous to 3CLpro, Pol,
helicase, HE, S, ORF4, E, M and N in CoV-HKU1 [21] were retrieved. Split decomposition
analysis was performed with SplitsTree version 3.2 [7] using Hamming correction and is
presented with the same edge length.
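
Split decomposition operates on a matrix of pairwise distances. As a rough illustration of the Hamming distance mentioned above, the sketch below computes the proportion of differing sites between aligned sequences. It is not SplitsTree's implementation: the function names and the decision to skip columns containing a gap are assumptions made for this example.

```python
from itertools import combinations


def hamming_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(aligned):
    """Return {(name_1, name_2): distance} for every pair of aligned sequences."""
    return {
        (first, second): hamming_distance(aligned[first], aligned[second])
        for first, second in combinations(sorted(aligned), 2)
    }
```

A distance matrix that fits a single tree yields a simple branching diagram, whereas conflicting signals in the distances appear as the box-like reticulations interpreted in this study as possible recombination.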


Results 


The genome organizations of CoV-HKU1 and other group 2 coronaviruses are
shown in Fig. 1a. Phylogenetic trees using predicted a.a. sequences of putative
proteins and polypeptides of CoV-HKU1 and other group 2 coronaviruses were
constructed (Fig. 1b). The putative proteins and polypeptides included p65, conserved
portions of nsp1 (PL1pro, A1pp, PL2pro, HD1 and HD2), nsp2-7,
nsp9-13, HE, S, E, M and N. All trees revealed that CoV-HKU1 formed a distinct
branch among group 2 coronaviruses. Interestingly, of the 14 trees of p65 to nsp10,
nine (64%) (p65, HD1, HD2, nsp3, nsp4, nsp6, nsp7, nsp9 and nsp10) showed
that CoV-HKU1 was clustered with MHV (Fig. 1b). However, for the eight trees
of nsp11 to N, seven (88%) showed that the CoV-HKU1 branch appeared as the
first branch among group 2 coronaviruses (Fig. 1b).

Comparison of the cysteine residues in the N-terminal 800 a.a. residues of S
in CoV-HKU1 and those in the different groups of coronaviruses revealed that
almost all the conserved cysteine residues in group 2 coronaviruses were present
in CoV-HKU1 (Fig. 2a), supporting the conclusion that CoV-HKU1 is a member of
group 2 coronaviruses.

The number of ORF 1ab, HE, S, M and N sequences in the group 2 coronaviruses
used for correspondence analysis is shown in Table 1. The results of the 


Fig. 1. Genome organization and phylogenetic analysis of CoV-HKU1. a Genome
organization of CoV-HKU1 (GenBank accession no. AY597011), MHV (GenBank accession
no. AF201929), HCoV-OC43 (GenBank accession no. AY585229) and BCoV (GenBank
accession no. NC_003045). The homologous regions used for phylogenetic analysis are
shaded. b Phylogenetic analysis of p65, conserved portions of nsp1 (PL1pro, A1pp, PL2pro,
HD1 and HD2), nsp2-7, nsp9-13, HE, S, E, M and N in group 2 coronaviruses. The trees were
constructed by the neighbour-joining method using Jukes-Cantor correction, with bootstrap
values calculated from 1000 trees. 578, 204, 107, 212, 421, 496, 303, 287, 89, 197, 110, 137,
928, 595, 521, 374, 299, 424, 1287, 84, 226 and 445 a.a. positions in p65, PL1pro, A1pp,
PL2pro, HD1, HD2, nsp2, nsp3, nsp4, nsp5, nsp6, nsp7, nsp9, nsp10, nsp11, nsp12, nsp13,
HE, S, E, M and N respectively were included in the analysis. The scale bar indicates the
estimated number of substitutions per 5 or 10 a.a. as indicated. The corresponding a.a.
sequences of HCoV-229E were used as the outgroups, except for p65 and HE, for which the
corresponding a.a. sequences in SARS-CoV and influenza C virus respectively were used as
the outgroups.
2304 P. C. Y. Woo et al. 


[Fig. 2 image. Panel a: schematic of cysteine positions along S, with row labels including IBV and Group 2 and a residue scale running from 1 to 700. Panel b: scatter plot of Axis 1 (36.6%) against Axis 2 (19.3%), with points labelled CoV-HKU1, "Group 2: orf1ab, HE, S, M" and "Group 2: N".]


Fig. 2. Analysis of cysteine positions in the N-terminal 800 a.a. residues of S and codon
usage patterns of CoV-HKU1. a Schematic representation of cysteine positions (#) in the
N-terminal domain of S in CoV-HKU1 in comparison with those in other coronaviruses.
Conserved cysteine residues of S in different coronaviruses are joined by solid lines. The bar
indicates the a.a. residue positions on S. b A scatter plot of the scores for the codon usage
patterns of ORF 1ab, HE, S, M and N in MHV, HCoV-OC43, BCoV, PHEV, SDAV, PV and
CoV-HKU1 on the first and second axes.


correspondence analysis with respect to axes 1 and 2 are shown in Fig. 2b. Axes 1
and 2 explained 36.6% and 19.3% of the variation in codon usage respectively.
For ORF 1ab, HE, S and M, the scores on axis 1 in group 2 coronaviruses other
than CoV-HKU1 were clustered between -0.16 and 0.28 and those in CoV-HKU1
were clustered between -0.40 and -0.24 (Fig. 2b). For N, the scores on axis 1
in group 2 coronaviruses other than CoV-HKU1 were clustered between 0.48 and
0.57 and that in CoV-HKU1 was at 0.11 (Fig. 2b). These results indicate that the
codon usage patterns of the genes in CoV-HKU1 differed significantly from those in
other group 2 coronaviruses.

Table 1. Number of ORF 1ab, hemagglutinin-esterase (HE), spike (S), membrane (M) and
nucleocapsid (N) sequences in the various groups of coronaviruses used for correspondence
analysis

ORF       No. of sequences used*
          MHV   HCoV-OC43   BCoV   PHEV   SDAV   PV   SARS-CoV   CoV-HKU1

ORF 1ab   fi    3           4      0      0      0    2          1
HE        3     3           8      2      1      1    0          1
S         12    3           9      p)     1      0    2          1
M         fi    3           6      2      1      0    2          1
N         11    3           7      2      1      1    2          1

*HCoV-OC43, human coronavirus OC43; MHV, murine hepatitis virus; BCoV, bovine
coronavirus; SDAV, rat sialodacryoadenitis coronavirus; PHEV, porcine hemagglutinating
encephalomyelitis virus; PV, puffinosis virus; SARS-CoV, SARS coronavirus; CoV-HKU1,
human coronavirus HKU1.


Split decomposition analysis revealed that recombination events had occurred
between CoV-HKU1 and other group 2 coronaviruses in 3CLpro, Pol, helicase,
HE, S, ORF4, E and M (Fig. 3). No evidence of recombination was shown between
the N of CoV-HKU1 and those of other group 2 coronaviruses.


Discussion 


CoV-HKU1 is a distinct member of group 2 coronaviruses. Both phylogenetic
analysis of 22 protein coding regions (Fig. 1b) and analysis of the conserved
cysteine residues in the amino-terminal region of the S proteins (Fig. 2a)
confirmed that CoV-HKU1 is a group 2 coronavirus. Furthermore, phylogenetic
analysis of the 22 protein coding regions revealed that there were 10-54% a.a.
differences between a particular protein coding region in CoV-HKU1 and the
corresponding region in the most closely related sequence, indicating that
CoV-HKU1 is distinct from the other group 2 coronaviruses. This was further
supported by the results of correspondence analysis of codon usage (Fig. 2b).

Recombination events were common among CoV-HKU1 and other group 2
coronaviruses. Coronaviruses have a high frequency of homologous RNA
recombination, which has been observed in both tissue culture [10, 14] and
experimentally infected animals [8]. In split tree analysis, recombination events
result in reticulations instead of simple branching structures. As shown in Fig. 3,
recombination was particularly frequent in CoV-HKU1 and MHV compared with
other group 2 coronaviruses such as BCoV and HCoV-OC43. The particularly high
recombination frequency in MHV [1] is in line with evidence of extensive
inter-strain recombination, as shown by the high number of reticulations in various
ORFs of the different MHV strains (Fig. 3). Complete genome sequencing of
additional CoV-HKU1 strains and further split tree analysis would shed light on
whether CoV-HKU1 behaves more like MHV or like BCoV and HCoV-OC43.

CoV-HKU1 may have originated from a major recombination event and numerous
minor recombination events among group 2 coronaviruses. In feline coronavirus,
the site of recombination has been pinpointed to a region of about 50
nucleotides in the M gene by multiple alignment [6]. As for recombination between
different strains of MHV, in vitro studies have shown both variable sites and
rates of recombination, with the S gene having a frequency threefold that of
the polymerase gene [4, 14]. In CoV-HKU1, nine of the 14 phylogenetic trees
constructed using deduced a.a. sequences of p65 to nsp10 showed that CoV-HKU1
was clustered with MHV (Fig. 1b). Interestingly, the topologies of the phylogenetic
trees changed dramatically from nsp11. For the eight trees from nsp11 to N, seven
revealed that the CoV-HKU1 branch appeared as the first branch among the group
2 coronaviruses (Fig. 1b) (P < 0.01 by chi-square test). A logical explanation is
that a major recombination event took place in the region between nsp10 and
nsp11 when CoV-HKU1 first appeared. However, this recombination event was
not evident in multiple alignment performed at the junction between nsp10 and
nsp11 (data not shown). This is because, although CoV-HKU1 clusters more closely
with MHV from p65 to nsp10, the difference between the phylogenetic distances
from CoV-HKU1 to MHV and those from CoV-HKU1 to BCoV/HCoV-OC43
is not marked (Fig. 1b), in contrast to what was observed in feline coronavirus
[6]. Furthermore, bootscanning analysis of the whole genome did not reveal any
putative recombination break point (data not shown). We speculate that this could
be due to numerous minor recombination events between p65 and nsp10, such
as between p65 and nsp1-PL1pro, between nsp1-PL2pro and nsp1-HD1, between
nsp4 and nsp5, and between nsp5 and nsp6. These events have resulted in CoV-HKU1
being clustered with MHV in only nine of the 14 phylogenetic trees constructed using
deduced a.a. sequences from p65 to nsp10, and in the CoV-HKU1 branch being the
first branch among the group 2 coronaviruses in four of those 14 trees.
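
The chi-square comparison quoted above can be sketched as follows. The text does not give the contingency table, so the layout used here is an assumption: rows are the two genome regions (p65 to nsp10, and nsp11 to N) and columns are trees in which the CoV-HKU1 branch is, or is not, the first branch, giving counts of 4 versus 10 and 7 versus 1. The function name is hypothetical, and Pearson's statistic is computed without continuity correction, which may differ from what the authors did.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns the statistic and the upper-tail P value for 1 degree of freedom.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Assumed table: (first branch, not first branch) for each genome region.
statistic, p_value = chi_square_2x2(4, 10, 7, 1)
```

Under these assumptions the statistic is 99/14, about 7.07, which corresponds to P < 0.01 and agrees with the value reported in the text. With only 22 trees and small expected counts, an exact test would usually be considered as well.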